Adaptation of Relation Extraction Rules to New Domains
Authors
Abstract
This paper presents several strategies for improving extraction performance on less prominent relations with the help of rules learned for similar relations for which large volumes of data with suitable properties are available. The rules are learned by DARE, a minimally supervised machine learning system for relation extraction. Starting from semantic seeds, DARE extracts linguistic grammar rules associated with semantic roles from parsed news texts. A performance analysis across different experimental domains shows that data properties play an important role for DARE. In particular, the redundancy of the data and the connectivity between instances and pattern rules strongly influence recall. However, most real-world data sets do not possess the desirable small-world property. We therefore propose three strategies that overcome the data-property problem of some domains by exploiting a similar domain with better data properties. The first two strategies stay with the same corpus but try to extract new, similar relations with the learned rules. The third strategy adapts the learned rules to a new corpus. All three strategies show that frequently mentioned relations can help in detecting less frequent ones.
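The seed-driven learning loop summarized in the abstract can be illustrated with a minimal sketch. This is not DARE itself: it assumes that each parsed sentence has already been reduced to a (pattern, instance) pair, where a plain string stands in for a linguistic grammar rule, and the function name `bootstrap`, the toy mentions, and the seed are all hypothetical. It only shows why connectivity between instances and rules limits recall.

```python
def bootstrap(mentions, seeds, max_iterations=10):
    """Toy seed-based bootstrapping (an illustrative assumption, not DARE).

    mentions: list of (pattern, instance) pairs, one per relation mention.
    seeds:    set of instances known to be correct at the start.
    Returns the set of extracted instances and the set of learned patterns.
    """
    instances = set(seeds)
    rules = set()
    for _ in range(max_iterations):
        # Rule learning: keep every pattern that co-occurs with a known instance.
        new_rules = {p for p, inst in mentions if inst in instances} - rules
        rules |= new_rules
        # Rule application: keep every instance matched by a learned pattern.
        new_instances = {inst for p, inst in mentions if p in rules} - instances
        instances |= new_instances
        # Stop once an iteration yields nothing new.
        if not new_rules and not new_instances:
            break
    return instances, rules


# Hypothetical toy data: X is the recipient, Y the prize.
mentions = [
    ("X won the Y", ("Curie", "Nobel")),
    ("Y laureate X", ("Curie", "Nobel")),
    ("Y laureate X", ("Morrison", "Pulitzer")),
    ("X received the Y", ("Morrison", "Pulitzer")),
    ("X received the Y", ("Dylan", "Grammy")),
    ("the Y went to X", ("Lee", "Oscar")),  # shares nothing with the rest
]
seeds = {("Curie", "Nobel")}

instances, rules = bootstrap(mentions, seeds)
```

In this toy run the last mention is never extracted, because neither its pattern nor its instance is linked to anything reachable from the seed. Poorly connected data of this kind is where, per the abstract, recall suffers.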
Similar resources
Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
Relation Extraction (RE) is the task of extracting semantic relationships between entities in text. Recent studies on relation extraction are mostly supervised. The clear drawback of supervised methods is the need for training data: labeled data is expensive to obtain, and there is often a mismatch between the training data and the data the system will be applied to. This is the problem of domai...
Robust Domain Adaptation for Relation Extraction via Clustering Consistency
We propose a two-phase framework to adapt existing relation extraction classifiers to extract relations for new target domains. We address two challenges: negative transfer when knowledge in source domains is used without considering the differences in relation distributions; and lack of adequate labeled samples for rarer relations in the new domain, due to a small labeled data set and imbalanc...
Bootstrapping relation extraction from semantic seeds
Information Extraction (IE) is a technology for localizing and classifying pieces of relevant information in unstructured natural language texts and detecting relevant relations among them. This thesis deals with one of the central tasks of IE, i.e., relation extraction. The goal is to provide a general framework that automatically learns mappings between linguistic analyses and target semantic...
Domain Adaptation for Relation Extraction with Domain Adversarial Neural Network
Relations are expressed in many domains such as newswire, weblogs and phone conversations. Trained on a source domain, a relation extractor’s performance degrades when applied to target domains other than the source. A common yet labor-intensive method for domain adaptation is to construct a target-domain-specific labeled dataset for adapting the extractor. In response, we present an unsupervise...
Domain-Neutral Relation Characterisation: Evaluation on Disease-Treatment Data
Adapting conventional supervised relation extraction (RE) systems to new domains requires significant effort from annotators and developers. Thus, we propose models for relation characterisation – the subtask of RE that assigns types to extracted relations – that have domain adaptation costs of zero. Development experiments on newswire text compare dimensionality reduction techniques and show t...